Add ARJ, ARC, and ACE archive decompression support#194
Add ARJ, ARC, and ACE archive decompression support#194
Conversation
… clone requirement Co-authored-by: gfs <98900+gfs@users.noreply.github.com>
Co-authored-by: gfs <98900+gfs@users.noreply.github.com>
Co-authored-by: gfs <98900+gfs@users.noreply.github.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
Adds native extraction support for ARJ, ARC, and ACE archives (via SharpCompress) to RecursiveExtractor, including MiniMagic type detection and test coverage to validate detection + extraction output counts.
Changes:
- Upgraded SharpCompress to enable ARJ/ARC/ACE reader support.
- Added MiniMagic signatures and new
ArchiveFileTypeenum values for ARJ/ARC/ACE, plus registered new extractors. - Added small fixture archives and expanded unit tests to cover detection and extracted-file counts.
Reviewed changes
Copilot reviewed 10 out of 13 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| RecursiveExtractor/RecursiveExtractor.csproj | Bumps SharpCompress dependency to a version that supports ARJ/ARC/ACE. |
| RecursiveExtractor/MiniMagic.cs | Adds ARJ/ARC/ACE types and signature detection; increases header read size. |
| RecursiveExtractor/Extractors/ArjExtractor.cs | New ARJ extractor implementation (sync + async). |
| RecursiveExtractor/Extractors/ArcExtractor.cs | New ARC extractor implementation (sync + async), with post-decompression size check. |
| RecursiveExtractor/Extractors/AceExtractor.cs | New ACE extractor implementation (sync + async). |
| RecursiveExtractor/Extractor.cs | Registers ARJ/ARC/ACE extractors in SetDefaultExtractors(). |
| RecursiveExtractor.Tests/TestData/TestDataArchives/TestData.arj | Adds ARJ test fixture archive. |
| RecursiveExtractor.Tests/TestData/TestDataArchives/TestData.arc | Adds ARC test fixture archive. |
| RecursiveExtractor.Tests/TestData/TestDataArchives/TestData.ace | Adds ACE test fixture archive. |
| RecursiveExtractor.Tests/RecursiveExtractor.Tests.csproj | Ensures new fixture archives are copied to test output. |
| RecursiveExtractor.Tests/ExtractorTests/MiniMagicTests.cs | Adds MiniMagic detection tests for ARJ/ARC/ACE fixtures. |
| RecursiveExtractor.Tests/ExtractorTests/ExpectedNumFilesTests.cs | Adds extraction count expectations for ARJ/ARC/ACE fixtures. |
| .github/copilot-instructions.md | Documents deep-clone requirement for Nerdbank.GitVersioning builds. |
|
@copilot it doesn't intuitive eem like it should be related - but with this change i'm seeing the tests for |
…e each signature check on bytes actually read Co-authored-by: gfs <98900+gfs@users.noreply.github.com>
SharpCompress 0.44.5 now throws CryptographicException at Open() time for encrypted 7z archives, leaving the stream at a non-zero position. Subsequent password retry calls to Open() then fail with EndOfStreamException. Reset fileEntry.Content.Position = 0 before each retry, matching the pattern already used by RarExtractor and ZipExtractor. Co-authored-by: gfs <98900+gfs@users.noreply.github.com>
Good catch — it's not MiniMagic but a side effect of the SharpCompress 0.44.5 upgrade. In the old version, Fixed in 9444f82 by adding |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@copilot something I wonder about these new test samples for these formats - the fallback case of extact self on fail also would result in 1 file being extracted - I think maybe our expected num files tests should check to make sure that the status isn't set to failed archive. What do you think? |
Three test methods (ExtractArchiveParallel, ExtractArchiveFromStreamAsync, ExtractArchiveFromStream) only verified result count, which could mask extraction failures for single-file archives (ARJ, ARC, ACE) since ExtractSelfOnFail returns 1 result with FailedArchive status. Now all three also assert that no result has FailedArchive status. Co-authored-by: gfs <98900+gfs@users.noreply.github.com>
Great point — with |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 14 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (5)
RecursiveExtractor/Extractors/AceExtractor.cs:136
- The entry stream from
aceReader.OpenEntryStream()is not disposed. SinceFileEntrycopies the stream whenpassthroughStreamis false, dispose the entry stream immediately after constructing theFileEntryto avoid holding reader resources longer than needed.
try
{
governor.CheckResourceGovernor(entry.Size);
var stream = aceReader.OpenEntryStream();
var name = entry.Key?.Replace('/', Path.DirectorySeparatorChar);
if (string.IsNullOrEmpty(name))
{
Logger.Debug(Extractor.ENTRY_MISSING_NAME_ERROR_MESSAGE_STRING, ArchiveFileType.ACE, fileEntry.FullPath);
continue;
}
newFileEntry = new FileEntry(name, stream, fileEntry, false, entry.CreatedTime, entry.LastModifiedTime, entry.LastAccessedTime, memoryStreamCutoff: options.MemoryStreamCutoff);
}
RecursiveExtractor/Extractors/ArjExtractor.cs:136
- The entry stream from
arjReader.OpenEntryStream()is not disposed. SinceFileEntrycopies the stream whenpassthroughStreamis false, the source stream can be disposed immediately after constructing theFileEntryto avoid holding resources unnecessarily.
{
var stream = arjReader.OpenEntryStream();
var name = entry.Key?.Replace('/', Path.DirectorySeparatorChar);
if (string.IsNullOrEmpty(name))
{
Logger.Debug(Extractor.ENTRY_MISSING_NAME_ERROR_MESSAGE_STRING, ArchiveFileType.ARJ, fileEntry.FullPath);
continue;
}
newFileEntry = new FileEntry(name, stream, fileEntry, false, entry.CreatedTime, entry.LastModifiedTime, entry.LastAccessedTime, memoryStreamCutoff: options.MemoryStreamCutoff);
}
RecursiveExtractor/Extractors/ArcExtractor.cs:138
- The entry stream returned by
arcReader.OpenEntryStream()is never disposed. BecauseFileEntrycopies the contents (passthrough=false), it’s safe to dispose the entry stream immediately after constructingnew FileEntry(...).
try
{
var stream = arcReader.OpenEntryStream();
var name = entry.Key?.Replace('/', Path.DirectorySeparatorChar);
if (string.IsNullOrEmpty(name))
{
Logger.Debug(Extractor.ENTRY_MISSING_NAME_ERROR_MESSAGE_STRING, ArchiveFileType.ARC, fileEntry.FullPath);
continue;
}
newFileEntry = new FileEntry(name, stream, fileEntry, false, entry.CreatedTime, entry.LastModifiedTime, entry.LastAccessedTime, memoryStreamCutoff: options.MemoryStreamCutoff);
}
RecursiveExtractor/MiniMagic.cs:191
- These 'if' statements can be combined.
if (bytesRead >= 4)
{
if (buffer[0] == 0x50 && buffer[1] == 0x4B && buffer[2] == 0x03 && buffer[3] == 0x04)
{
return ArchiveFileType.ZIP;
}
}
RecursiveExtractor/MiniMagic.cs:211
- These 'if' statements can be combined.
if (bytesRead >= 7)
{
if (buffer[0] == 0x52 && buffer[1] == 0x61 && buffer[2] == 0x72 && buffer[3] == 0x21 && buffer[4] == 0x1A && buffer[5] == 0x07 && buffer[6] == 0x00)
{
return ArchiveFileType.RAR;
}
}
… 'using' statements for better resource management
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 14 changed files in this pull request and generated 8 comments.
Comments suppressed due to low confidence (4)
RecursiveExtractor/Extractors/AceExtractor.cs:140
- Same issue in the sync path:
using var stream = aceReader.OpenEntryStream()is followed by{ ... }, which won’t compile. Use ausing (...) { ... }statement or ausing var ...;declaration without a block.
using var stream = aceReader.OpenEntryStream()
{
var name = entry.Key?.Replace('/', Path.DirectorySeparatorChar);
if (string.IsNullOrEmpty(name))
{
Logger.Debug(Extractor.ENTRY_MISSING_NAME_ERROR_MESSAGE_STRING, ArchiveFileType.ACE, fileEntry.FullPath);
continue;
}
newFileEntry = new FileEntry(name, stream, fileEntry, false, entry.CreatedTime, entry.LastModifiedTime, entry.LastAccessedTime, memoryStreamCutoff: options.MemoryStreamCutoff);
}
RecursiveExtractor/Extractors/ArcExtractor.cs:142
- Same issue in the sync path:
using var stream = arcReader.OpenEntryStream()is written with a following{ ... }block, which won’t compile. Useusing (var stream = ...) { ... }or ausing var ...;declaration without a block.
using var stream = arcReader.OpenEntryStream()
{
var name = entry.Key?.Replace('/', Path.DirectorySeparatorChar);
if (string.IsNullOrEmpty(name))
{
Logger.Debug(Extractor.ENTRY_MISSING_NAME_ERROR_MESSAGE_STRING, ArchiveFileType.ARC, fileEntry.FullPath);
continue;
}
newFileEntry = new FileEntry(name, stream, fileEntry, false, entry.CreatedTime, entry.LastModifiedTime, entry.LastAccessedTime, memoryStreamCutoff: options.MemoryStreamCutoff);
}
RecursiveExtractor/MiniMagic.cs:191
- These 'if' statements can be combined.
if (bytesRead >= 4)
{
if (buffer[0] == 0x50 && buffer[1] == 0x4B && buffer[2] == 0x03 && buffer[3] == 0x04)
{
return ArchiveFileType.ZIP;
}
}
RecursiveExtractor/MiniMagic.cs:211
- These 'if' statements can be combined.
if (bytesRead >= 7)
{
if (buffer[0] == 0x52 && buffer[1] == 0x61 && buffer[2] == 0x72 && buffer[3] == 0x21 && buffer[4] == 0x1A && buffer[5] == 0x07 && buffer[6] == 0x00)
{
return ArchiveFileType.RAR;
}
}
| } | ||
|
|
||
| governor.CheckResourceGovernor(entry.Size); | ||
| using var entryStream = aceReader.OpenEntryStream() |
There was a problem hiding this comment.
using var entryStream = aceReader.OpenEntryStream() is followed by a block ({ ... }), which is invalid syntax for a using-declaration and will not compile. Convert this to a using (...) { ... } statement or remove the braces and terminate the declaration with ;.
This issue also appears on line 131 of the same file.
| using var entryStream = aceReader.OpenEntryStream() | |
| using (var entryStream = aceReader.OpenEntryStream()) |
| try | ||
| { | ||
| aceReader = AceReader.Open(fileEntry.Content, new ReaderOptions() | ||
| { | ||
| LeaveStreamOpen = true | ||
| }); |
There was a problem hiding this comment.
Reset fileEntry.Content.Position = 0 before calling AceReader.Open(...) (in both async and sync methods). This matches the established extractor pattern (e.g., Zip/GZip) and avoids failures when the stream position isn’t at the beginning.
| var resultsList = extractor.Extract(path, stream, GetExtractorOptions()).ToList(); | ||
| Assert.DoesNotContain(resultsList, r => r.EntryStatus == FileEntryStatus.FailedArchive); | ||
| Assert.Equal(expectedNumFiles, resultsList.Count); | ||
| stream.Close(); |
There was a problem hiding this comment.
Redundant stream.Close() call: the stream is already disposed by the using var stream = new FileStream(...). Removing the explicit close reduces noise and avoids confusion about ownership/lifetime.
| stream.Close(); |
| continue; | ||
| } | ||
|
|
||
| using var entryStream = arcReader.OpenEntryStream() |
There was a problem hiding this comment.
using var entryStream = arcReader.OpenEntryStream() is followed by a block ({ ... }), which is invalid syntax for a using-declaration and will not compile. Change to a using (...) { ... } statement or end the declaration with ; and remove the braces.
This issue also appears on line 133 of the same file.
| using var entryStream = arcReader.OpenEntryStream() | |
| using (var entryStream = arcReader.OpenEntryStream()) |
| using var stream = arjReader.OpenEntryStream() | ||
| { | ||
| var name = entry.Key?.Replace('/', Path.DirectorySeparatorChar); | ||
| if (string.IsNullOrEmpty(name)) | ||
| { | ||
| Logger.Debug(Extractor.ENTRY_MISSING_NAME_ERROR_MESSAGE_STRING, ArchiveFileType.ARJ, fileEntry.FullPath); | ||
| continue; | ||
| } | ||
| newFileEntry = new FileEntry(name, stream, fileEntry, false, entry.CreatedTime, entry.LastModifiedTime, entry.LastAccessedTime, memoryStreamCutoff: options.MemoryStreamCutoff); | ||
| } |
There was a problem hiding this comment.
using var stream = arjReader.OpenEntryStream() is written with a following block ({ ... }), which is invalid C# syntax for a using-declaration and will not compile. Use either a using (...) { ... } statement or a using var stream = ...; declaration (and remove the braces).
| try | ||
| { | ||
| arjReader = ArjReader.Open(fileEntry.Content, new ReaderOptions() | ||
| { | ||
| LeaveStreamOpen = true | ||
| }); |
There was a problem hiding this comment.
Before calling ArjReader.Open(...), reset fileEntry.Content.Position = 0 (and similarly in the sync Extract method). Most other extractors do this to avoid failures when the stream has been read/rewound elsewhere before extraction.
| try | ||
| { | ||
| arcReader = ArcReader.Open(fileEntry.Content, new ReaderOptions() | ||
| { | ||
| LeaveStreamOpen = true | ||
| }); |
There was a problem hiding this comment.
Consider resetting fileEntry.Content.Position = 0 before ArcReader.Open(...) in both ExtractAsync and Extract. Other extractors do this to ensure extraction works even if the stream has been partially read before the extractor runs.
| if (bytesRead >= 3) | ||
| { | ||
| if (buffer[0] == 0x42 && buffer[1] == 0x5A && buffer[2] == 0x68) | ||
| { | ||
| return ArchiveFileType.GZIP; | ||
| return ArchiveFileType.BZIP2; | ||
| } | ||
| } |
There was a problem hiding this comment.
These 'if' statements can be combined.
This issue also appears in the following locations of the same file:
- line 185
- line 205
See below for a potential fix:
if (bytesRead >= 3 && buffer[0] == 0x42 && buffer[1] == 0x5A && buffer[2] == 0x68)
{
return ArchiveFileType.BZIP2;
}
if (bytesRead >= 4 && buffer[0] == 0x50 && buffer[1] == 0x4B && buffer[2] == 0x03 && buffer[3] == 0x04)
{
return ArchiveFileType.ZIP;
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.